Data-bandwidth-aware Job Scheduling Techniques in Distributed Systems

Authors

  • De-Yu Chen
  • Pangfeng Liu
  • Jan-Jan Wu
Abstract

This paper introduces techniques for scheduling jobs on a master/workers platform where the bandwidth is shared by all workers. The jobs are independent, and each job requires a fixed amount of bandwidth to download its input data before execution. The master can communicate with multiple workers simultaneously, provided that the bandwidth used by the master and by each worker does not exceed their respective bandwidth limits. We propose two models for this limited-bandwidth problem. If the data transfer cannot be interrupted, we prove that the scheduling problem is NP-complete; nevertheless, we propose heuristic algorithms and test their performance experimentally. If the data transfer can be interrupted, we propose an algorithm that produces a schedule with optimal makespan. The algorithm is based on a binary search on the completion time, combined with an efficient feasibility verification process for a given completion time.
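
The "binary search on the completion time plus feasibility check" pattern mentioned above can be illustrated with a minimal sketch. The abstract does not describe the paper's actual feasibility verification, so the check below is a stand-in for a much simpler, assumed model: only the transfer phase is considered, each worker has a fixed total amount of data to receive, and interruptible transfers are treated as a fluid flow. All names (`feasible`, `min_completion_time`, `data`, `worker_bw`, `master_bw`) are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def feasible(deadline: float, data: Sequence[float],
             worker_bw: Sequence[float], master_bw: float) -> bool:
    """Toy feasibility check (an assumption, not the paper's procedure).

    In a fluid model with interruptible transfers, all data can arrive by
    `deadline` if no worker needs more than its own bandwidth cap allows
    and the total does not exceed what the master can send in that time.
    """
    if any(d > bw * deadline for d, bw in zip(data, worker_bw)):
        return False
    return sum(data) <= master_bw * deadline


def min_completion_time(data: Sequence[float], worker_bw: Sequence[float],
                        master_bw: float, iterations: int = 100) -> float:
    """Binary search for the smallest deadline accepted by `feasible`."""
    if master_bw <= 0 or any(bw <= 0 for bw in worker_bw):
        raise ValueError("bandwidth limits must be positive")

    # Grow the upper bound until it is feasible; lo stays infeasible (or 0).
    lo, hi = 0.0, 1.0
    while not feasible(hi, data, worker_bw, master_bw):
        lo, hi = hi, hi * 2.0

    # Feasibility is monotone in the deadline, so bisection applies.
    # A fixed iteration count avoids termination issues with float gaps.
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if feasible(mid, data, worker_bw, master_bw):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Two workers needing 100 and 50 units, each capped at 10 units/s,
    # with the master capped at 12 units/s.
    print(min_completion_time([100, 50], [10, 10], 12))
```

For these inputs the search converges to about 12.5, because the master's limit (150 units at 12 units per second) is the bottleneck. In this toy model the answer also has a closed form, so the search is purely illustrative; the pattern matters when feasibility for a given completion time is easier to decide than the optimum is to compute directly.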

Similar resources

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

Bandwidth-Aware Resource Management for Extreme Scale Systems

As systems scale towards exascale, many resources will become increasingly constrained. While some of these resources have historically been explicitly allocated, many, like network bandwidth, I/O bandwidth, or power, have not. As systems continue to evolve, we expect many such resources to become explicitly managed. This change will pose critical challenges to resource management and job sched...

A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability

A Data Grid is an infrastructure that manages a huge number of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids focus on only one kind of Grid job, which can be data...

Green Energy-aware task scheduling using the DVFS technique in Cloud Computing

Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...

Replica-Aware Job Scheduling in Distributed Systems

This paper proposes an effective replica-aware scheduling algorithm for independent jobs in Grid and distributed systems. The proposed algorithm considers not only the execution time of jobs but also the location and transfer time of data and data replica that these jobs require. We propose a cost model to estimate the starting time and earliest completion time of a job and its associated data ...

Publication date: 2009